Marco Chierici
June 1, 2022
(partially abridged from The Python Graph Gallery, Plotly doc, Folium doc)
We give here a short and partial overview of three main methods to deal with geospatial mapping in Python, with a focus on choropleth maps:
conda install -c conda-forge geopandas, conda install -c conda-forge geoplotconda install -c plotly plotly=5.8.0conda install -c conda-forge foliumimport numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
# don't display warning messages
warnings.filterwarnings(action='ignore')
# show plots inline and make them interactive
%matplotlib inline
A map is a set of polygon coordinates displayed on a 2D canvas. So, first you need to get those polygons! Common sources are:
geopandas is the preferred way to import these files into Pythongeopandas again, or basemap) with the information for common locations or regions (Europe, USA, etc.)Now that you have a set of polygons, you can plot it using different approaches. For example:
geoplot, if you have your data into a geopandas' geodataframe;plotly, if you want or need an interactive map from a geodataframe;folium, if you want to use Google Map styled maps.A choropleth map is a map composed of colored polygons, conveying spatial variations of a quantity. To create them, you'll need:
GeoPandas extends the datatypes used by pandas to support spatial operations on geometric types.
We start by importing a geoJSON file with the state boundaries of France.
import geopandas as gpd
data = gpd.read_file("https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/france.geojson")
data.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
data is a geodataframe, where each row represents an item in the geoJSON file - in this case, a region of France. The columns describe the features of each region: in particular, the geometry column stores the coordinates of the region boundary.
print(type(data))
<class 'geopandas.geodataframe.GeoDataFrame'>
The most common and straightforward way to plot a map from a geopandas dataframe is with geoplot.
import geoplot as gplt
import geoplot.crs as gcrs # load projections
gplt.polyplot(data, projection=gcrs.AlbersEqualArea(),
edgecolor='darkgrey', facecolor='lightgrey',
linewidth=.3,
figsize=(12, 8))
plt.show()
Now that we have learned how to draw a map combining GeoPandas and geoplot, let's create a choropleth map to visualize the unemployment rate of each US county.
# Load the json file with county coordinates
geoData = gpd.read_file('https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/US-counties.geojson')
# Make sure the "id" column is an integer
geoData.id = geoData.id.astype(str).astype(int)
geoData
| id | GEO_ID | STATE | COUNTY | NAME | LSAD | CENSUSAREA | geometry | |
|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 0500000US01001 | 01 | 001 | Autauga | County | 594.436 | POLYGON ((-86.49677 32.34444, -86.71790 32.402... |
| 1 | 1009 | 0500000US01009 | 01 | 009 | Blount | County | 644.776 | POLYGON ((-86.57780 33.76532, -86.75914 33.840... |
| 2 | 1017 | 0500000US01017 | 01 | 017 | Chambers | County | 596.531 | POLYGON ((-85.18413 32.87053, -85.12342 32.772... |
| 3 | 1021 | 0500000US01021 | 01 | 021 | Chilton | County | 692.854 | POLYGON ((-86.51734 33.02057, -86.51596 32.929... |
| 4 | 1033 | 0500000US01033 | 01 | 033 | Colbert | County | 592.619 | POLYGON ((-88.13999 34.58170, -88.13925 34.587... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3216 | 51001 | 0500000US51001 | 51 | 001 | Accomack | County | 449.496 | MULTIPOLYGON (((-75.24227 38.02721, -75.29687 ... |
| 3217 | 51021 | 0500000US51021 | 51 | 021 | Bland | County | 357.725 | POLYGON ((-81.22510 37.23487, -81.20477 37.243... |
| 3218 | 51027 | 0500000US51027 | 51 | 027 | Buchanan | County | 502.763 | POLYGON ((-81.96830 37.53780, -81.92787 37.512... |
| 3219 | 51037 | 0500000US51037 | 51 | 037 | Charlotte | County | 475.271 | POLYGON ((-78.44332 37.07940, -78.49303 36.891... |
| 3220 | 51041 | 0500000US51041 | 51 | 041 | Chesterfield | County | 423.297 | POLYGON ((-77.85180 37.35487, -77.85515 37.418... |
3221 rows × 8 columns
# Remove Alaska, Hawaii and Puerto Rico.
stateToRemove = ['02', '15', '72']
geoData = geoData[~geoData.STATE.isin(stateToRemove)]
# Basic plot with just county outlines
gplt.polyplot(geoData, figsize=(20, 4))
plt.show()
To build a choropleth map, we need numeric data to link to the map: we load the unemployment rate of US counties from the Bureau of Labor Statistics (source).
data = pd.read_csv('https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/unemployment-x.csv')
data.head()
| id | state | county | rate | |
|---|---|---|---|---|
| 0 | 1001 | Alabama | Autauga County | 5.1 |
| 1 | 1003 | Alabama | Baldwin County | 4.9 |
| 2 | 1005 | Alabama | Barbour County | 8.6 |
| 3 | 1007 | Alabama | Bibb County | 6.2 |
| 4 | 1009 | Alabama | Blount County | 5.1 |
# plot a histogram of unemployment rate
sns.distplot(data["rate"], hist=True, kde=False, rug=False );
Now we can merge both sources of data (map and unemployment rates) in order to visualize them together. We'll be using the merge() function of geopandas:
fullData = geoData.merge(data, left_on=['id'], right_on=['id'])
fullData.head()
| id | GEO_ID | STATE | COUNTY | NAME | LSAD | CENSUSAREA | geometry | state | county | rate | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 0500000US01001 | 01 | 001 | Autauga | County | 594.436 | POLYGON ((-86.49677 32.34444, -86.71790 32.402... | Alabama | Autauga County | 5.1 |
| 1 | 1009 | 0500000US01009 | 01 | 009 | Blount | County | 644.776 | POLYGON ((-86.57780 33.76532, -86.75914 33.840... | Alabama | Blount County | 5.1 |
| 2 | 1017 | 0500000US01017 | 01 | 017 | Chambers | County | 596.531 | POLYGON ((-85.18413 32.87053, -85.12342 32.772... | Alabama | Chambers County | 5.0 |
| 3 | 1021 | 0500000US01021 | 01 | 021 | Chilton | County | 692.854 | POLYGON ((-86.51734 33.02057, -86.51596 32.929... | Alabama | Chilton County | 5.2 |
| 4 | 1033 | 0500000US01033 | 01 | 033 | Colbert | County | 592.619 | POLYGON ((-88.13999 34.58170, -88.13925 34.587... | Alabama | Colbert County | 6.5 |
We are happy that geoplot provides a choropleth() function!
choropleth(df, projection=None, hue=None, cmap=None, scheme=None)
The hue parameter expects the name of the column we want to use to control the color of each county. Then, we have to pick a suitable color palette (cmap) and a binning scheme (scheme). Geoplot comes with both continuous and categorical binning schemes, i.e. methods that split a sequence of observations into a number of bins.
fig, ax = plt.subplots(1, 1, figsize=(16, 12))
# set up the binning sheme
import mapclassify as mc
scheme = mc.Quantiles(fullData['rate'], k=10)
# map
gplt.choropleth(fullData,
hue="rate",
linewidth=.1,
scheme=scheme, cmap='inferno_r',
legend=True,
edgecolor='black',
ax=ax
)
ax.set_title('Unemployment rate in US counties', fontsize=13)
Text(0.5, 1.0, 'Unemployment rate in US counties')
Plotly is a open-source graphing library aimed at interactivity for Python, R, and other languages. It can be used offline and does not require any account registration (Plotly.com has also paid licenses for enterprise users).
We'll create a choropleth map of the US unemployment rate with Plotly, this time using a slightly different approach for gathering the data.
We load a GeoJSON file with the geometry information for US counties:
from urllib.request import urlopen
import json
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
counties = json.load(response)
counties["features"][0]
{'type': 'Feature',
'properties': {'GEO_ID': '0500000US01001',
'STATE': '01',
'COUNTY': '001',
'NAME': 'Autauga',
'LSAD': 'County',
'CENSUSAREA': 594.436},
'geometry': {'type': 'Polygon',
'coordinates': [[[-86.496774, 32.344437],
[-86.717897, 32.402814],
[-86.814912, 32.340803],
[-86.890581, 32.502974],
[-86.917595, 32.664169],
[-86.71339, 32.661732],
[-86.714219, 32.705694],
[-86.413116, 32.707386],
[-86.411172, 32.409937],
[-86.496774, 32.344437]]]},
'id': '01001'}
Each feature.id is a FIPS code (see one of our previous R labs).
Next, we load unemployment data by county, also indexed by FIPS code.
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/fips-unemp-16.csv",
dtype={"fips": str})
df.head()
| fips | unemp | |
|---|---|---|
| 0 | 01001 | 5.3 |
| 1 | 01003 | 5.4 |
| 2 | 01005 | 8.6 |
| 3 | 01007 | 6.6 |
| 4 | 01009 | 5.5 |
We are ready for the plotting. We'll use Plotly express, Plotly's high-level API for creating figures. As it was for geoplot, we can use a very convenient choropleth() function.
import plotly.express as px
fig = px.choropleth(df, geojson=counties,
locations='fips',
color='unemp',
color_continuous_scale="Viridis",
range_color=(0, 12),
scope="usa",
labels={'unemp': 'unemployment rate'}
)
fig.update_layout(margin={"r":0, "t":0, "l":0, "b":0})
fig.show()
import plotly.express as px
fig = px.choropleth(df, geojson=counties,
locations='fips',
color='unemp',
color_continuous_scale="Viridis",
range_color=(0, 12),
scope="usa",
labels={'unemp': 'unemployment rate'}
)
fig.update_layout(margin={"r":0, "t":0, "l":0, "b":0})
# better legend
fig.update_layout(coloraxis_colorbar=dict(
thicknessmode="pixels", thickness=10,
lenmode="pixels", len=150,
yanchor="top", y=0.8,
ticks="outside", ticksuffix=" %",
dtick=5
))
fig.show()
Especially if you work outside Jupyter notebooks, you can save the output plot as a standalone HTML file:
fig.write_html("plotly_choropleth_unemp.html")
You can then embed the HTML map in any HTML document using an iframe:
<iframe src="plotly_choropleth_unemp.html" title="Plotly choropleth map" style={{ border: "none", width: "800px", height: "300px" }}></iframe>
Folium is a Python wrapper of the leaflet.js JavaScript library, putting together the best of the two worlds: Python's data processing capabilities and JavaScript's interactive data visualization.
With Folium we can create a map with just one line of code (two if you count the import):
import folium
map = folium.Map(location=[47.6151, -122.3413], zoom_start=13)
map
As you see, by default Folium uses OpenStreetMap as tile provider.
The map can be saved to a standalone HTML file:
map.save("folium_basic_map.html")
You can use several options for the background tile using the tiles parameter. For example, let's take a look at Paris with a Stamen Toner or Stamen Terrain tile:
tonerMap = folium.Map(location=[48.85, 2.35], tiles="Stamen Toner", zoom_start=10)
tonerMap
tonerMap = folium.Map(location=[48.85, 2.35], tiles="Stamen Terrain", zoom_start=10)
tonerMap
Adding markers to a Folium map is as easy as creating a Pandas dataframe storing their coordinates.
# Make an empty map
m = folium.Map(location=[20, 0], zoom_start=2)
m
# Make a data frame with dots to show on the map
data = pd.DataFrame({
'lon':[-58, 2, 145, 30.32, -4.03, -73.57, 36.82, -38.5],
'lat':[-34, 49, -38, 59.93, 5.33, 45.52, -1.29, -12.97],
'name':['Buenos Aires', 'Paris', 'melbourne', 'St Petersbourg', 'Abidjan', 'Montreal', 'Nairobi', 'Salvador'],
'value':[10, 12, 40, 70, 23, 43, 100, 43]
}, dtype=str)
data
| lon | lat | name | value | |
|---|---|---|---|---|
| 0 | -58 | -34 | Buenos Aires | 10 |
| 1 | 2 | 49 | Paris | 12 |
| 2 | 145 | -38 | melbourne | 40 |
| 3 | 30.32 | 59.93 | St Petersbourg | 70 |
| 4 | -4.03 | 5.33 | Abidjan | 23 |
| 5 | -73.57 | 45.52 | Montreal | 43 |
| 6 | 36.82 | -1.29 | Nairobi | 100 |
| 7 | -38.5 | -12.97 | Salvador | 43 |
# add marker one by one on the map
for i in range(0,len(data)):
folium.Marker(
location=[data.iloc[i]['lat'], data.iloc[i]['lon']],
popup=data.iloc[i]['name']
).add_to(m)
# Show the map again
m
Time to create a choropleth map of the US unemployment rate with Folium. First, we grab the data: note that this time we only need a valid path / URL for the geoJSON with geometry and we do not need to read it in.
# geometry data
url = "https://raw.githubusercontent.com/python-visualization/folium/master/examples/data"
state_geo = f"{url}/us-states.json"
# unemployment data
state_unemployment = f"{url}/US_Unemployment_Oct2012.csv"
state_data = pd.read_csv(state_unemployment)
Next, we initialize a map setting tile and location:
m = folium.Map(location=[40, -95], zoom_start=4)
m
Finally, we create a folium.Choropleth object and we add it to our map:
folium.Choropleth(
geo_data=state_geo,
name="choropleth",
data=state_data,
columns=["State", "Unemployment"],
key_on="feature.id",
fill_color="YlGn",
fill_opacity=0.7,
line_opacity=.1,
legend_name="Unemployment Rate (%)",
).add_to(m)
folium.LayerControl().add_to(m)
m